Integrating Fine-Grained Message Passing in Cache Coherent Shared Memory Multiprocessors

Authors

  • David K. Poulsen
  • Pen-Chung Yew
Abstract

This paper considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency caused by interprocessor communication in cache coherent, shared memory multiprocessors. Data prefetching is accomplished by using a multiprocessor software pipelined algorithm. Data forwarding is used to target interprocessor data communication, rather than synchronization, and is applied to communication-related accesses between successive parallel loops. Prefetching and forwarding are each shown to be more effective for certain types of architectural and application characteristics. Given this result, a new hybrid prefetching and forwarding approach is proposed and evaluated that allows the relative amounts of prefetching and forwarding used to be adapted to these characteristics. When compared to prefetching or forwarding alone, the new hybrid scheme is shown to increase performance stability over varying application characteristics, to reduce processor instruction overheads, cache miss ratios, and memory system bandwidth requirements, and to reduce performance sensitivity to architectural parameters such as cache size. Algorithms for data prefetching, data forwarding, and hybrid prefetching and forwarding are described. These algorithms are applied by using a parallelizing compiler and are evaluated via execution-driven simulations of large, optimized, numerical application codes with loop-level and vector parallelism.

Similar resources

Automatic Parallelization for Non-cache Coherent Multiprocessors

Although much work has been done on parallelizing compilers for cache coherent shared memory multiprocessors and message-passing multiprocessors, there is relatively little research on parallelizing compilers for non-cache coherent multiprocessors with global address space. In this paper, we present a preliminary study on automatic parallelization for the Cray T3D, a commercial scalable machine ...

Effect of Virtual Channels and Memory Organization on Cache-coherent Shared-memory Multiprocessors

In this paper, performance of wormhole routed 2-D torus network with virtual channels has been evaluated for cache-coherent shared-memory multiprocessors with execution-driven simulation using various applications. The traffic in such systems is very different from the traffic in message-passing environment and is characterized by traffic bursts, one-to-many and many-to-one traffic, and small fixed length...

Integrating Multiple Communication Paradigms in High Performance Multiprocessors

In the design of FLASH, the successor to the Stanford DASH multiprocessor, we are exploring architectural mechanisms for efficiently supporting both the shared memory and message passing communication models in a single system. The unique feature in the FLASH (FLexible Architecture for SHared memory) system is the use of a programmable controller at each node that replaces the functionality of ...

Programming FFT on DSM Multiprocessors

The performance of the shared address space programming model for the kinds of coarse-grained communicating programs, which have traditionally been common in scientific computing, is not clear today. In this paper, we use the challenging 1-dimensional FFT, a regular coarse-grained program, as our driving application to study how to get high performance for such kind of applications under the s...

Execution Based Evaluation of MINs for Cache-Coherent Multiprocessors

In this paper, performance of multistage interconnection network with wormhole routing and packet switching has been evaluated for cache-coherent shared-memory multiprocessors. The evaluation is based on execution-driven simulation using various applications. The traffic in cache-coherent systems is very different from the traffic in message-passing environment and is characterized by traffic bursts,...

Journal:
  • J. Parallel Distrib. Comput.

Volume 33

Publication date: 1996